feat: inline Arrow IPC support + Python backend (appkit-py) #272
Closed
jamesbroadhead wants to merge 13 commits into databricks:main from
Conversation
Some serverless warehouses only support ARROW_STREAM with INLINE disposition, but the analytics plugin only offered JSON_ARRAY (INLINE) and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM" format option that uses INLINE disposition, making the plugin compatible with these warehouses. Fixes databricks#242
Tests verify:
- ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format
- ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format
- Default JSON format does not pass disposition or format overrides
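The format/disposition pairs these tests assert can be sketched as a lookup table. This is a Python sketch only; the plugin itself is TypeScript, and the `FORMAT_CONFIGS` name and dict shape here are assumptions borrowed from the TS constant mentioned later in the PR description.

```python
# Hypothetical mapping of plugin format names to the Databricks SQL
# statement-execution overrides (format + disposition) the tests describe.
# The real TS FORMAT_CONFIGS may differ in shape.
FORMAT_CONFIGS = {
    # New option: Arrow IPC inlined in the response body.
    "ARROW_STREAM": {"format": "ARROW_STREAM", "disposition": "INLINE"},
    # Existing option: Arrow IPC fetched via presigned external links.
    "ARROW": {"format": "ARROW_STREAM", "disposition": "EXTERNAL_LINKS"},
    # Default JSON path: no overrides are sent, so connector defaults apply.
    "JSON": None,
}

def statement_params(fmt: str) -> dict:
    """Return the extra statement-API parameters for a plugin format."""
    cfg = FORMAT_CONFIGS.get(fmt)
    return dict(cfg) if cfg else {}
```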
The server-side ARROW_STREAM format added in the previous commit was not exposed to the frontend or typegen:
- Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks
- Add "arrow_stream" to DataFormat in chart types
- Handle "arrow_stream" in useChartData's resolveFormat()
- Make typegen resilient to ARROW_STREAM-only warehouses by retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
…compatibility

ARROW_STREAM with INLINE disposition is the only format that works across all warehouse types, including serverless warehouses that reject JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout:
- Server: defaults.ts, analytics plugin request handler
- Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData
- Tests: update assertions for new default

JSON and ARROW formats remain available via explicit format parameter.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
When using the default ARROW_STREAM format, the analytics plugin now automatically falls back through formats if the warehouse rejects one: ARROW_STREAM → JSON → ARROW. This handles warehouses that only support a subset of format/disposition combinations without requiring users to know their warehouse's capabilities. Explicit format requests (JSON, ARROW) are respected without fallback.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Previously, _transformDataArray unconditionally called updateWithArrowStatus for any ARROW_STREAM response, which discards inline data and returns only statement_id + status. This was designed for EXTERNAL_LINKS (where data is fetched separately) but broke INLINE disposition, where data is in data_array.

Changes:
- _transformDataArray now checks for data_array before routing to the EXTERNAL_LINKS path: if data_array is present, it falls through to the standard row-to-object transform.
- JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying on connector defaults. This prevents the connector default format from leaking into explicit JSON requests.
- Connector defaults reverted to JSON_ARRAY for backward compatibility with classic warehouses (the analytics plugin sets formats explicitly).
- Added connector-level tests for _transformDataArray covering ARROW_STREAM + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
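The "standard row-to-object transform" mentioned above amounts to zipping column names with each data_array row. A minimal Python sketch (the connector itself is TypeScript; the column-descriptor shape here is an assumption):

```python
def rows_to_objects(columns: list[dict], data_array: list[list]) -> list[dict]:
    """Zip column names with each row of data_array into named row
    objects, mirroring the JSON_ARRAY / INLINE result shape."""
    names = [col["name"] for col in columns]
    return [dict(zip(names, row)) for row in data_array]
```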
Some serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in `result.attachment` rather than `result.data_array`. This adds server-side decoding using apache-arrow's tableFromIPC to convert the attachment into row objects, producing the same response shape as JSON_ARRAY regardless of warehouse backend.

This abstracts a Databricks internal implementation detail (different warehouses returning different response formats) so app developers get a consistent `type: "result"` response with named row objects.

Changes:
- Add apache-arrow@21.1.0 as a server dependency (already used client-side)
- _transformDataArray detects `attachment` field and decodes via tableFromIPC
- Connector tests use real base64 Arrow IPC captured from a live serverless warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS, serverless INLINE attachment, data_array fallback, and edge cases

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Python implementation of the AppKit backend using FastAPI, providing the same HTTP API surface as the TypeScript version for all plugins: analytics (SSE query streaming), files (11 endpoints), and genie (3 SSE endpoints).

Includes full test suite (48 unit + 41 integration tests), SSE streaming infrastructure with reconnection support, contextvars-based user context, interceptor chain (retry/timeout/cache), and Databricks SDK connector wiring.

Co-authored-by: Isaac
- Fix path traversal in SPA static file serving (use resolve() + prefix check)
- Fix upload endpoint OOM: stream body with running size counter
- Fix CacheInterceptor to actually use TTL (was storing forever)
- Fix StreamManager reconnection: persist EventRingBuffer per stream_id
- Fix _UserContextProxy: only wrap async methods, leave sync methods alone
- Fix _load_query path traversal: reject /, \, .. in query_key
- Fix Content-Disposition header injection: sanitize filename
- Fix format_buffered_event: apply sanitize_event_type on replay
- Fix ruff target-version to match requires-python (py312)
- Fix __main__.py: load dotenv, use APPKIT_HOST env var
- Add abort_all() implementation to StreamManager

Co-authored-by: Isaac
…es, path traversal

- Fix OBO: create per-request WorkspaceClient from x-forwarded-access-token instead of reusing global service-principal client for all routes
- Fix ARROW format: use EXTERNAL_LINKS disposition and emit arrow event with statement_id (matching TS FORMAT_CONFIGS)
- Fix SQL connector: check for FAILED/CANCELED/CLOSED states after polling and raise with error message instead of returning empty result
- Fix FilesConnector.resolve_path: reject path traversal (..) sequences
- Update all file/genie endpoints to use per-request user client

Co-authored-by: Isaac
…aceId

- Add pyarrow-based Arrow IPC attachment decoding (decode_arrow_attachment) matching TS _transformArrowAttachment for serverless warehouse support
- Implement get_arrow_data: download external link chunks via httpx
- Use transform_result() in analytics handler for unified result processing
- Add maxSize enforcement to FilesConnector.read()
- Auto-inject workspaceId parameter in process_query_params when query references :workspaceId
- Add pyarrow and httpx to runtime dependencies

Co-authored-by: Isaac
…→ ARROW)

Mirrors the TS _executeWithFormatFallback: when the default ARROW_STREAM format is rejected by a warehouse (classic warehouses don't support INLINE + ARROW_STREAM), it automatically falls back through JSON then ARROW. Verified working against live Databricks SQL Warehouse.

Co-authored-by: Isaac
Summary
- Inline Arrow IPC: some serverless warehouses return results as base64 Arrow IPC in result.attachment instead of data_array. This PR decodes them transparently in the SQL connector, with automatic format fallback (ARROW_STREAM → JSON → ARROW).
- Python backend (packages/appkit-py/): 100% API-compatible Python implementation of the AppKit backend using FastAPI. Serves all 17+ endpoints with identical SSE wire format, enabling Python-based Databricks Apps.

Python backend highlights
- contextvars-based user context (OBO): per-request WorkspaceClient from x-forwarded-* headers
- databricks-sdk + pyarrow for live Databricks queries with Arrow IPC decoding
- Format fallback mirroring the TS _executeWithFormatFallback

Security (addressed via ACE multi-model review)
Test plan
(pnpm test)

Related PRs
row.toJSON()

This pull request was AI-assisted by Isaac.